106 research outputs found

Visualizing the Feature Importance for Black Box Models

Author: A Cutler
A Goldstein
B Bischl
B Gregorutti
C Molnar
C Strobl
E Štrumbelj
G Casalicchio
J Vanschoren
L Breiman
LS Shapley
Michel Lang
RJ Serfling
S Cohen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/12/2018

In recent years, a large number of model-agnostic methods to improve the transparency, trustability and interpretability of machine learning models have been developed. We introduce local feature importance as a local version of a recent model-agnostic global feature importance method. Based on local feature importance, we propose two visual tools: partial importance (PI) and individual conditional importance (ICI) plots, which visualize how changes in a feature affect the model performance on average, as well as for individual observations. Our proposed methods are related to partial dependence (PD) and individual conditional expectation (ICE) plots, but visualize the expected (conditional) feature importance instead of the expected (conditional) prediction. Furthermore, we show that averaging ICI curves across observations yields a PI curve, and integrating the PI curve with respect to the distribution of the considered feature results in the global feature importance. Another contribution of our paper is the Shapley feature importance, which fairly distributes the overall performance of a model among the features according to the marginal contributions and which can be used to compare the feature importance across different models.

Comment: To appear in Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, September 10 to 14, 2018, Proceedings, Part

arXiv.org e-Print Archive

Crossref

Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability

Author: AA Freitas
B Bischl
B Ustun
C Molnar
G Casalicchio
H Fanaee-T
H Schielzeth
J Fürnkranz
J Huysmans
J Knowles
JH Friedman
K Hamidieh
M Philipp
P Cortez
Q Zhou
R Guidotti
Publication date: 23/09/2019

Post-hoc model-agnostic interpretation methods such as partial dependence plots can be employed to interpret complex machine learning models. While these interpretation methods can be applied regardless of model complexity, they can produce misleading and verbose results if the model is too complex, especially w.r.t. feature interactions. To quantify the complexity of arbitrary machine learning models, we propose model-agnostic complexity measures based on functional decomposition: number of features used, interaction strength and main effect complexity. We show that post-hoc interpretation of models that minimize the three measures is more reliable and compact. Furthermore, we demonstrate the application of these measures in a multi-objective optimization approach which simultaneously minimizes loss and complexity.

arXiv.org e-Print Archive

Crossref

Banse, K. and S.A. Piontkovsky (eds.). The mesoscale structure of the epipelagic ecosystem of the open Northern Arabian Sea

Author: Bischl B.
Blume H.
Botteck M.
Igel Christian
Martin R.
Rudolph G.
Rötter G.
Theimer W.
Vatolkin I.
Weihs C.
Publication venue: Consejo Superior de Investigaciones Científicas (España)
Publication date: 01/12/2006

Book review: BANSE, K. and S.A. PIONTKOVSKY (eds.). – 2006. The mesoscale structure of the epipelagic ecosystem of the open Northern Arabian Sea. Universities Press, Hyderabad, India. 237 pp. ISBN 81 7371 496 7.

This book presents an extensive body of information obtained mainly from the thirtieth cruise of the R/V Professor Bodyanitsky to the Arabian Sea, carried out in 1990. It is part of a series published by the Universities Press, India, with the support of the Indian Academy of Sciences in Bangalore, whose aim is to narrow the English-Russian language gap concerning scientific literature on low-latitude oceans.

Peer reviewed

Directory of Open Access Journals

Copenhagen University Research Information System

Scientia Marina (E-Journal)

Publikationsserver der RWTH Aachen University

Digital.CSIC

Analyzing the BBOB Results by Means of Benchmarking Concepts

Author: Bischl B.
Mersmann O.
Preuss M.
Trautmann H.
Weihs C.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2015

We present methods to answer two basic questions that arise when benchmarking optimization algorithms. The first one is: which algorithm is the "best" one? and the second one is: which algorithm should I use for my real-world problem? Both are connected and neither is easy to answer. We present a theoretical framework for designing and analyzing the raw data of such benchmark experiments. This represents a first step in answering the aforementioned questions. The 2009 and 2010 BBOB benchmark results are analyzed by means of this framework and we derive insight regarding the answers to the two questions. Furthermore, we discuss how to properly aggregate rankings from algorithm evaluations on individual problems into a consensus, its theoretical background and which common pitfalls should be avoided. Finally, we address the grouping of test problems into sets with similar optimizer rankings and investigate whether these are reflected by already proposed test problem characteristics, finding that this is not always the case.

FWN – Publicaties zonder aanstelling Universiteit Leiden

Leiden University Scholary Publications

Meta-learning for symbolic hyperparameter defaults

Author: Bischl B.
Gijsbers P.
Pfisterer F.
Rijn J.N. van
Vanschoren J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021
Field of study

Computer Systems, Imagery and Medi

Leiden University Scholary Publications

Target‐oriented habitat and wildlife management: estimating forage quantity and quality of semi‐natural grasslands with Sentinel‐1 and Sentinel‐2 data

Author: Bischl B.
Gibson D. J.
Haykin S.
Isselstein J.
Justice C. O.
Müller‐Wilm U.
Pausas J. G.
Peeters A.
R Core Team
Riesch F.
Rosenthal G.
Ruß G.
Wachendorf M.
Zandler H.
Publication venue: 'Wiley'
Publication date: 24/02/2020

Semi‐natural grasslands represent ecosystems with high biodiversity. Their conservation depends on the removal of biomass, for example, through grazing by livestock or wildlife. For this, spatially explicit information about grassland forage quantity and quality is a prerequisite for efficient management. The recent advancements of the Sentinel satellite mission offer new possibilities to support the conservation of semi‐natural grasslands. In this study, the combined use of radar (Sentinel‐1) and multispectral (Sentinel‐2) data to predict forage quantity and quality indicators of semi‐natural grassland in Germany was investigated. Field data for organic acid detergent fibre concentration (oADF), crude protein concentration (CP), compressed sward height (CSH) and standing biomass dry weight (DM) collected between 2015 and 2017 were related to remote sensing data using the random forest regression algorithm. In total, 102 optical‐ and radar‐based predictor variables were used to derive an optimized dataset, maximizing the predictive power of the respective model. High R² values were obtained for the grassland quality indicators oADF (R² = 0.79, RMSE = 2.29%) and CP (R² = 0.72, RMSE = 1.70%) using 15 and 8 predictor variables respectively. Lower R² values were achieved for the quantity indicators CSH (R² = 0.60, RMSE = 2.77 cm) and DM (R² = 0.45, RMSE = 90.84 g/m²). A permutation‐based variable importance measure indicated a strong contribution of simple ratio‐based optical indices to the model performance. In particular, the ratios between the narrow near‐infrared and red‐edge region were among the most important variables. The model performance for oADF, CP and CSH was only marginally increased by adding Sentinel‐1 data. For DM, no positive effect on the model performance was observed by combining Sentinel‐1 and Sentinel‐2 data. Thus, optical Sentinel‐2 data might be sufficient to accurately predict forage quality, and to some extent also quantity indicators of semi‐natural grassland.

Crossref

Enlighten

Evaluation of random forest and ensemble methods at predicting complications following cardiac surgery

Author: A Pinto
B Bischl
B Lutkenhoner
B Sharif-Kashani
C Reis
F Roques
FJ Valverde-Albacete
H Hirose
HJ Geissler
JH Friedman
JW McEvoy
L Breiman
LI Kuncheva
M Galar
M-M Bouamrane
NV Chawla
O Pitkänen
P Kaul
P Knapik
Q Ji
S Barnett
S Eappen
SA Nashef
SH Walker
SR Moonesinghe
TK Ho
TK Wang
TKM Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/05/2019

Cardiac patients undergoing surgery face increased risk of postoperative complications, due to a combination of factors, including higher risk surgery, their age at time of surgery and the presence of co-morbid conditions. They will therefore require high levels of care and clinical resources throughout their perioperative journey (i.e. before, during and after surgery). Although surgical mortality rates in the UK have remained low, postoperative complications are common and can have a significant impact on patients’ quality of life, increase hospital length of stay and healthcare costs. In this study we used and compared several machine learning methods – random forest, AdaBoost, gradient boosting model and stacking – to predict severe postoperative complications after cardiac surgery based on preoperative variables obtained from a surgical database of a large acute care hospital in Scotland. Our results show that AdaBoost has the best overall performance (AUC = 0.731), and also outperforms EuroSCORE and EuroSCORE II in other studies predicting postoperative complications. Random forest (Sensitivity = 0.852, negative predictive value = 0.923), however, and gradient boosting model (Sensitivity = 0.875 and negative predictive value = 0.920) have the best performance at predicting severe postoperative complications based on sensitivity and negative predictive value.

Crossref

University of Strathclyde Institutional Repository